Dataset statistics
| Number of variables | 11 |
|---|---|
| Number of observations | 900 |
| Missing cells | 881 |
| Missing cells (%) | 8.9% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 212.6 KiB |
| Average record size in memory | 241.9 B |
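The overview percentages can be re-derived from the counts reported elsewhere in this report. The sketch below is a minimal check, assuming the per-column missing counts listed in the variable sections (Birthday_year, Medical_Tent, City) are the only sources of missing cells; the variable names are illustrative and not part of pandas-profiling.

```python
# Figures copied from this report; names are illustrative only.
n_rows, n_cols = 900, 11
missing_per_column = {"Birthday_year": 177, "Medical_Tent": 702, "City": 2}

# Missing cells are summed over columns and divided by rows * columns.
missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / (n_rows * n_cols)

# Average record size is total memory divided by the number of rows.
total_bytes = 212.6 * 1024  # 212.6 KiB expressed in bytes
avg_record_bytes = total_bytes / n_rows

print(missing_cells, round(missing_pct, 1), round(avg_record_bytes, 1))
```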
Variable types
| NUM | 6 |
|---|---|
| CAT | 4 |
| BOOL | 1 |
Reproduction
| Analysis started | 2020-05-10 01:02:58.275800 |
|---|---|
| Analysis finished | 2020-05-10 01:03:04.659849 |
| Version | pandas-profiling v2.6.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
Warnings
| Warning | Type |
|---|---|
| Name has a high cardinality: 899 distinct values | High cardinality |
| Birthday_year has 177 (19.7%) missing values | Missing |
| Medical_Tent has 702 (78.0%) missing values | Missing |
| Family_Case_ID is highly skewed (γ1 = 26.33381889) | Skewed |
| Parents or siblings infected has 685 (76.1%) zeros | Zeros |
| Wife/Husband or children infected has 614 (68.2%) zeros | Zeros |
| Medical_Expenses_Family has 15 (1.7%) zeros | Zeros |
Patient_ID
Numeric
| Distinct count | 900 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 450.5 |
|---|---|
| Minimum | 1 |
| Maximum | 900 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.2 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 45.95 |
| Q1 | 225.75 |
| median | 450.5 |
| Q3 | 675.25 |
| 95-th percentile | 855.05 |
| Maximum | 900 |
| Range | 899 |
| Interquartile range (IQR) | 449.5 |
Descriptive statistics
| Standard deviation | 259.9519186 |
|---|---|
| Coefficient of variation (CV) | 0.5770297861 |
| Kurtosis | -1.2 |
| Mean | 450.5 |
| Mean absolute deviation (MAD) | 225 |
| Skewness | 0 |
| Sum | 405450 |
| Variance | 67575 |
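These statistics are what one would expect from a plain running index. The sketch below assumes the column holds exactly the integers 1 to 900 (consistent with 900 distinct values, minimum 1 and maximum 900) and that quantiles use linear interpolation between neighbouring order statistics; the `quantile` helper is hypothetical and written only for this illustration.

```python
import math
import statistics

# Assumption: the identifier column is exactly 1, 2, ..., 900.
values = list(range(1, 901))

def quantile(sorted_values, q):
    """Quantile by linear interpolation between neighbouring order statistics."""
    pos = (len(sorted_values) - 1) * q
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, divisor n - 1
std = math.sqrt(variance)
cv = std / mean                         # coefficient of variation
q1, q3 = quantile(values, 0.25), quantile(values, 0.75)

print(mean, variance, round(std, 4), round(cv, 4), q1, q3, q3 - q1)
```

Under these assumptions the results agree with the quantile and descriptive tables above, which suggests the column carries no information beyond row order.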
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 900.], "bayesian blocks" binning strategy used)
| Value | Count | Frequency (%) | |
| 900 | 1 | 0.1% | |
| 337 | 1 | 0.1% | |
| 307 | 1 | 0.1% | |
| 306 | 1 | 0.1% | |
| 305 | 1 | 0.1% | |
| 304 | 1 | 0.1% | |
| 303 | 1 | 0.1% | |
| 302 | 1 | 0.1% | |
| 301 | 1 | 0.1% | |
| 300 | 1 | 0.1% | |
| Other values (890) | 890 | 98.9% |
| Value | Count | Frequency (%) | |
| 1 | 1 | 0.1% | |
| 2 | 1 | 0.1% | |
| 3 | 1 | 0.1% | |
| 4 | 1 | 0.1% | |
| 5 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 900 | 1 | 0.1% | |
| 899 | 1 | 0.1% | |
| 898 | 1 | 0.1% | |
| 897 | 1 | 0.1% | |
| 896 | 1 | 0.1% |
Family_Case_ID
Numeric
| Distinct count | 675 |
|---|---|
| Unique (%) | 75.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 14305.827777777778 |
|---|---|
| Minimum | 345 |
| Maximum | 742836 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.2 KiB |
Quantile statistics
| Minimum | 345 |
|---|---|
| 5-th percentile | 2741 |
| Q1 | 8203 |
| median | 13593.5 |
| Q3 | 18906.5 |
| 95-th percentile | 23178.25 |
| Maximum | 742836 |
| Range | 742491 |
| Interquartile range (IQR) | 10703.5 |
Descriptive statistics
| Standard deviation | 25418.1539 |
|---|---|
| Coefficient of variation (CV) | 1.776769181 |
| Kurtosis | 753.1134642 |
| Mean | 14305.82778 |
| Mean absolute deviation (MAD) | 6456.299667 |
| Skewness | 26.33381889 |
| Sum | 12875245 |
| Variance | 646082547.7 |
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[3.45000e+02 1.78350e+03 1.02325e+04 1.02675e+04 2.11850e+04 2.12155e+04 2.44955e+04 7.42836e+05], "bayesian blocks" binning strategy used)
| Value | Count | Frequency (%) | |
| 14502 | 7 | 0.8% | |
| 18593 | 7 | 0.8% | |
| 20586 | 7 | 0.8% | |
| 16969 | 6 | 0.7% | |
| 10262 | 6 | 0.7% | |
| 23426 | 6 | 0.7% | |
| 9819 | 5 | 0.6% | |
| 21188 | 5 | 0.6% | |
| 4680 | 5 | 0.6% | |
| 21207 | 4 | 0.4% | |
| Other values (665) | 842 | 93.6% |
| Value | Count | Frequency (%) | |
| 345 | 1 | 0.1% | |
| 981 | 1 | 0.1% | |
| 1773 | 1 | 0.1% | |
| 1794 | 1 | 0.1% | |
| 1816 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 742836 | 1 | 0.1% | |
| 125421 | 1 | 0.1% | |
| 24520 | 2 | 0.2% | |
| 24471 | 1 | 0.1% | |
| 24454 | 2 | 0.2% |
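The extreme skewness reported for this variable appears to come from the two largest values, which sit far above the 95-th percentile. The sketch below is a rough illustration, using only the sum, the observation count and the two largest values from the tables above, of how much those two records shift the mean; the variable names are illustrative.

```python
# Figures copied from the tables above; names are illustrative only.
n = 900
total = 12_875_245
top_two = [742_836, 125_421]  # the two largest values, each occurring once

mean_all = total / n
mean_without_top_two = (total - sum(top_two)) / (n - len(top_two))

print(round(mean_all, 2), round(mean_without_top_two, 2))
```

Dropping the two records moves the mean closer to the reported median of 13593.5, which is one reason to inspect them before modelling.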
Severity
Categorical
| Distinct count | 3 |
|---|---|
| Unique (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.2 KiB |
| Value | Count | Frequency (%) | |
| 3 | 498 | 55.3% | |
| 1 | 216 | 24.0% | |
| 2 | 186 | 20.7% |
Length
| Max length | 1 |
|---|---|
| Mean length | 1 |
| Min length | 1 |
| Value | Count | Frequency (%) | |
| Decimal_Number | 3 | 100.0% |
| Value | Count | Frequency (%) | |
| Common | 3 | 100.0% |
| Value | Count | Frequency (%) | |
| ASCII | 3 | 100.0% |
Name
Categorical
| Distinct count | 899 |
|---|---|
| Unique (%) | 99.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.2 KiB |
| Value | Count | Frequency (%) | |
| Mr. Samuel Darnell | 2 | 0.2% | |
| Master Leon Jesse | 1 | 0.1% | |
| Mr. Frank Cedric | 1 | 0.1% | |
| Master Kent Julian | 1 | 0.1% | |
| Ms. Rosa Angie | 1 | 0.1% | |
| Mr. Nelson Rodney | 1 | 0.1% | |
| Miss Katrina Regina | 1 | 0.1% | |
| Mr. Pedro Johnny | 1 | 0.1% | |
| Master Caleb Ben | 1 | 0.1% | |
| Ms. Erma Kari | 1 | 0.1% | |
| Other values (889) | 889 | 98.8% |
Length
| Max length | 26 |
|---|---|
| Mean length | 17.03111111 |
| Min length | 11 |
| Value | Count | Frequency (%) | |
| Lowercase_Letter | 26 | 52.0% | |
| Uppercase_Letter | 22 | 44.0% | |
| Space_Separator | 1 | 2.0% | |
| Other_Punctuation | 1 | 2.0% |
| Value | Count | Frequency (%) | |
| Latin | 48 | 96.0% | |
| Common | 2 | 4.0% |
| Value | Count | Frequency (%) | |
| ASCII | 50 | 100.0% |
Birthday_year
Numeric
| Distinct count | 70 |
|---|---|
| Unique (%) | 9.7% |
| Missing | 177 |
| Missing (%) | 19.7% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1990.2669432918397 |
|---|---|
| Minimum | 1940.0 |
| Maximum | 2019.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.2 KiB |
Quantile statistics
| Minimum | 1940 |
|---|---|
| 5-th percentile | 1964 |
| Q1 | 1982 |
| median | 1992 |
| Q3 | 1999.5 |
| 95-th percentile | 2016 |
| Maximum | 2019 |
| Range | 79 |
| Interquartile range (IQR) | 17.5 |
Descriptive statistics
| Standard deviation | 14.52333493 |
|---|---|
| Coefficient of variation (CV) | 0.007297179396 |
| Kurtosis | 0.1772975931 |
| Mean | 1990.266943 |
| Mean absolute deviation (MAD) | 11.31994207 |
| Skewness | -0.3969539685 |
| Sum | 1438963 |
| Variance | 210.9272575 |
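The reported mean is consistent with missing values being excluded rather than counted as zero. The sketch below checks this from the sum, the observation count and the missing count given above; the variable names are illustrative.

```python
# Figures copied from the tables above; names are illustrative only.
n_rows = 900
n_missing = 177
total = 1_438_963  # reported sum over the non-missing values

n_present = n_rows - n_missing
mean_excluding_missing = total / n_present  # agrees with the reported mean
mean_if_missing_were_zero = total / n_rows  # shown only for contrast

print(n_present, round(mean_excluding_missing, 4), round(mean_if_missing_were_zero, 4))
```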
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 1996 | 31 | 3.4% | |
| 1998 | 28 | 3.1% | |
| 2002 | 27 | 3.0% | |
| 1990 | 26 | 2.9% | |
| 1999 | 25 | 2.8% | |
| 2001 | 25 | 2.8% | |
| 1992 | 25 | 2.8% | |
| 1995 | 24 | 2.7% | |
| 1984 | 22 | 2.4% | |
| 1991 | 22 | 2.4% | |
| Other values (60) | 468 | 52.0% | |
| (Missing) | 177 | 19.7% |
| Value | Count | Frequency (%) | |
| 1940 | 1 | 0.1% | |
| 1946 | 1 | 0.1% | |
| 1949 | 3 | 0.3% | |
| 1950 | 2 | 0.2% | |
| 1954 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 2019 | 14 | 1.6% | |
| 2018 | 10 | 1.1% | |
| 2017 | 6 | 0.7% | |
| 2016 | 10 | 1.1% | |
| 2015 | 4 | 0.4% |
Parents or siblings infected
Numeric
| Distinct count | 7 |
|---|---|
| Unique (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.38 |
|---|---|
| Minimum | 0 |
| Maximum | 6 |
| Zeros | 685 |
| Zeros (%) | 76.1% |
| Memory size | 7.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 2 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.8032470256 |
|---|---|
| Coefficient of variation (CV) | 2.113807962 |
| Kurtosis | 9.850572357 |
| Mean | 0.38 |
| Mean absolute deviation (MAD) | 0.5784444444 |
| Skewness | 2.756344643 |
| Sum | 342 |
| Variance | 0.6452057842 |
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.5 1.5 2.5 6. ], "bayesian blocks" binning strategy used)
| Value | Count | Frequency (%) | |
| 0 | 685 | 76.1% | |
| 1 | 120 | 13.3% | |
| 2 | 80 | 8.9% | |
| 5 | 5 | 0.6% | |
| 3 | 5 | 0.6% | |
| 4 | 4 | 0.4% | |
| 6 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 0 | 685 | 76.1% | |
| 1 | 120 | 13.3% | |
| 2 | 80 | 8.9% | |
| 3 | 5 | 0.6% | |
| 4 | 4 | 0.4% |
| Value | Count | Frequency (%) | |
| 6 | 1 | 0.1% | |
| 5 | 5 | 0.6% | |
| 4 | 4 | 0.4% | |
| 3 | 5 | 0.6% | |
| 2 | 80 | 8.9% |
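Because this variable takes only seven distinct values, its descriptive statistics can be recomputed directly from the frequency table. The sketch below does so and also shows that the MAD row matches the mean absolute deviation around the mean; the median absolute deviation would be 0 here, since more than half of the values equal the median. The variable names are illustrative.

```python
# Value -> count, copied from the frequency table above.
counts = {0: 685, 1: 120, 2: 80, 3: 5, 4: 4, 5: 5, 6: 1}

n = sum(counts.values())
total = sum(value * count for value, count in counts.items())
mean = total / n

# Sample variance (divisor n - 1), weighted by the counts.
variance = sum(count * (value - mean) ** 2 for value, count in counts.items()) / (n - 1)

# Mean absolute deviation around the mean.
mean_abs_dev = sum(count * abs(value - mean) for value, count in counts.items()) / n

print(n, total, mean, round(variance, 6), round(mean_abs_dev, 6))
```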
Wife/Husband or children infected
Numeric
| Distinct count | 7 |
|---|---|
| Unique (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.5211111111111111 |
|---|---|
| Minimum | 0 |
| Maximum | 8 |
| Zeros | 614 |
| Zeros (%) | 68.2% |
| Memory size | 7.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 3 |
| Maximum | 8 |
| Range | 8 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.09838535 |
|---|---|
| Coefficient of variation (CV) | 2.107775725 |
| Kurtosis | 18.02632118 |
| Mean | 0.5211111111 |
| Mean absolute deviation (MAD) | 0.7110271605 |
| Skewness | 3.706736663 |
| Sum | 469 |
| Variance | 1.206450377 |
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.5 1.5 4.5 8. ], "bayesian blocks" binning strategy used)
| Value | Count | Frequency (%) | |
| 0 | 614 | 68.2% | |
| 1 | 212 | 23.6% | |
| 2 | 28 | 3.1% | |
| 4 | 18 | 2.0% | |
| 3 | 16 | 1.8% | |
| 8 | 7 | 0.8% | |
| 5 | 5 | 0.6% |
| Value | Count | Frequency (%) | |
| 0 | 614 | 68.2% | |
| 1 | 212 | 23.6% | |
| 2 | 28 | 3.1% | |
| 3 | 16 | 1.8% | |
| 4 | 18 | 2.0% |
| Value | Count | Frequency (%) | |
| 8 | 7 | 0.8% | |
| 5 | 5 | 0.6% | |
| 4 | 18 | 2.0% | |
| 3 | 16 | 1.8% | |
| 2 | 28 | 3.1% |
Medical_Expenses_Family
Numeric
| Distinct count | 218 |
|---|---|
| Unique (%) | 24.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 895.7433333333333 |
|---|---|
| Minimum | 0 |
| Maximum | 14345 |
| Zeros | 15 |
| Zeros (%) | 1.7% |
| Memory size | 7.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 202 |
| Q1 | 221 |
| median | 405 |
| Q3 | 861.25 |
| 95-th percentile | 3108.35 |
| Maximum | 14345 |
| Range | 14345 |
| Interquartile range (IQR) | 640.25 |
Descriptive statistics
| Standard deviation | 1385.829926 |
|---|---|
| Coefficient of variation (CV) | 1.547128373 |
| Kurtosis | 33.69825684 |
| Mean | 895.7433333 |
| Mean absolute deviation (MAD) | 783.4914593 |
| Skewness | 4.80874505 |
| Sum | 806169 |
| Variance | 1920524.585 |
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 56. 177.5 196.5 201. ... 1609.5 2584. 4456.5 7355.5 14345. ], "bayesian blocks" binning strategy used)
| Value | Count | Frequency (%) | |
| 221 | 44 | 4.9% | |
| 225 | 44 | 4.9% | |
| 364 | 42 | 4.7% | |
| 217 | 41 | 4.6% | |
| 728 | 31 | 3.4% | |
| 202 | 28 | 3.1% | |
| 294 | 25 | 2.8% | |
| 218 | 24 | 2.7% | |
| 222 | 18 | 2.0% | |
| 0 | 15 | 1.7% | |
| Other values (208) | 588 | 65.3% |
| Value | Count | Frequency (%) | |
| 0 | 15 | 1.7% | |
| 112 | 1 | 0.1% | |
| 140 | 1 | 0.1% | |
| 175 | 1 | 0.1% | |
| 180 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 14345 | 3 | 0.3% | |
| 7364 | 4 | 0.4% | |
| 7347 | 2 | 0.2% | |
| 6931 | 2 | 0.2% | |
| 6371 | 4 | 0.4% |
Medical_Tent
Categorical
| Distinct count | 8 |
|---|---|
| Unique (%) | 4.0% |
| Missing | 702 |
| Missing (%) | 78.0% |
| Memory size | 7.2 KiB |
| Value | Count | Frequency (%) | |
| C | 57 | 6.3% | |
| B | 46 | 5.1% | |
| E | 31 | 3.4% | |
| D | 31 | 3.4% | |
| A | 15 | 1.7% | |
| F | 13 | 1.4% | |
| G | 4 | 0.4% | |
| T | 1 | 0.1% | |
| (Missing) | 702 | 78.0% |
Length
| Max length | 3 |
|---|---|
| Mean length | 2.56 |
| Min length | 1 |
| Value | Count | Frequency (%) | |
| Uppercase_Letter | 8 | 80.0% | |
| Lowercase_Letter | 2 | 20.0% |
| Value | Count | Frequency (%) | |
| Latin | 10 | 100.0% |
| Value | Count | Frequency (%) | |
| ASCII | 10 | 100.0% |
City
Categorical
| Distinct count | 3 |
|---|---|
| Unique (%) | 0.3% |
| Missing | 2 |
| Missing (%) | 0.2% |
| Memory size | 7.2 KiB |
| Value | Count | Frequency (%) | |
| Santa Fe | 649 | 72.1% | |
| Albuquerque | 169 | 18.8% | |
| Taos | 80 | 8.9% | |
| (Missing) | 2 | 0.2% |
Length
| Max length | 11 |
|---|---|
| Mean length | 8.196666667 |
| Min length | 3 |
| Value | Count | Frequency (%) | |
| Lowercase_Letter | 11 | 68.8% | |
| Uppercase_Letter | 4 | 25.0% | |
| Space_Separator | 1 | 6.2% |
| Value | Count | Frequency (%) | |
| Latin | 15 | 93.8% | |
| Common | 1 | 6.2% |
| Value | Count | Frequency (%) | |
| ASCII | 16 | 100.0% |
Deceased
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.2 KiB |
| Value | Count | Frequency (%) | |
| 1 | 553 | 61.4% | |
| 0 | 347 | 38.6% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
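As a minimal sketch of this definition (not the code pandas-profiling itself runs), the function below divides the covariance by the product of the standard deviations. The name `pearson_r` and the sample data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1 / (n - 1) factors of covariance and variances cancel, so raw sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)

# Shifting and positively rescaling either variable leaves r unchanged.
r_rescaled = pearson_r([10 * x + 3 for x in xs], [0.5 * y - 7 for y in ys])
print(round(r, 4), round(r_rescaled, 4))
```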
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
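A minimal sketch of this calculation, assuming tied values receive the average of the ranks they span (a common convention, stated here as an assumption): rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. The helper names and sample data are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# A monotonic but nonlinear relationship: rho is 1 while r stays below 1.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(round(spearman_rho(xs, ys), 4), round(pearson_r(xs, ys), 4))
```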
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
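The pair-counting rule can be sketched as follows. This is the simple variant that divides by the total number of pairs (often called tau-a); libraries frequently apply a tie-corrected variant instead, so values may differ when the data contain ties. The function name and sample data are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts toward neither group
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs

# Six pairs in total: five concordant and one discordant.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(round(tau, 4))
```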
First rows
| | Patient_ID | Family_Case_ID | Severity | Name | Birthday_year | Parents or siblings infected | Wife/Husband or children infected | Medical_Expenses_Family | Medical_Tent | City | Deceased |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 4696 | 3 | Miss Linda Betty | NaN | 0 | 0 | 225 | NaN | Santa Fe | 1 |
| 1 | 2 | 21436 | 1 | Ms. Ramona Elvira | 1966.0 | 0 | 1 | 1663 | NaN | Albuquerque | 0 |
| 2 | 3 | 7273 | 3 | Mr. Mario Vernon | 1982.0 | 0 | 0 | 221 | NaN | Santa Fe | 1 |
| 3 | 4 | 8226 | 3 | Mr. Hector Joe | 1997.0 | 0 | 0 | 220 | NaN | Santa Fe | 1 |
| 4 | 5 | 19689 | 3 | Ms. Jennie Debra | 1994.0 | 0 | 0 | 222 | NaN | Santa Fe | 0 |
| 5 | 6 | 17598 | 2 | Master Terrell Bob | NaN | 0 | 0 | 0 | NaN | Santa Fe | 1 |
| 6 | 7 | 7563 | 3 | Mr. Kristopher Francis | 1984.0 | 0 | 1 | 435 | NaN | Santa Fe | 1 |
| 7 | 8 | 9520 | 2 | Mr. Lorenzo Bennie | 1989.0 | 0 | 0 | 364 | NaN | Santa Fe | 0 |
| 8 | 9 | 6314 | 3 | Mr. Rickey Dennis | 2000.0 | 1 | 1 | 441 | NaN | Albuquerque | 0 |
| 9 | 10 | 14392 | 3 | Miss Elena Cathy | NaN | 1 | 1 | 626 | F | Albuquerque | 0 |
Last rows
| | Patient_ID | Family_Case_ID | Severity | Name | Birthday_year | Parents or siblings infected | Wife/Husband or children infected | Medical_Expenses_Family | Medical_Tent | City | Deceased |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 890 | 891 | 1907 | 1 | Mr. Ronnie Hugo | 1992.0 | 0 | 0 | 743 | C | Santa Fe | 0 |
| 891 | 892 | 742836 | 3 | Mr. Dawson Beil | 1985.0 | 0 | 0 | 219 | NaN | Taos | 1 |
| 892 | 893 | 125421 | 3 | Ms. Shayan Meyer | 1973.0 | 0 | 1 | 196 | NaN | Santa Fe | 0 |
| 893 | 894 | 345 | 2 | Mr. Adam Donovan | 1958.0 | 0 | 0 | 271 | NaN | Taos | 1 |
| 894 | 895 | 9846 | 3 | Mr. Noel Mcdougall | 1993.0 | 0 | 0 | 243 | NaN | Santa Fe | 1 |
| 895 | 896 | 6253 | 3 | Ms. Linda Wilcox | 1998.0 | 1 | 1 | 344 | NaN | Santa Fe | 0 |
| 896 | 897 | 6483 | 3 | Mr. Haiden Vance | 2006.0 | 0 | 0 | 258 | NaN | Santa Fe | 0 |
| 897 | 898 | 981 | 3 | Miss Anaiya Love | 1990.0 | 0 | 0 | 214 | NaN | Taos | 1 |
| 898 | 899 | 16418 | 2 | Mr. Robert Williams | 1994.0 | 1 | 1 | 812 | NaN | Santa Fe | 0 |
| 899 | 900 | 3782 | 3 | Ms. Marjorie Hays | 2002.0 | 0 | 0 | 202 | C | Albuquerque | 0 |